From Instruction to Imitation: How In-Context Learning Works

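This module moves from the traditional paradigm of weight-based fine-tuning to the dynamic world of in-context learning (ICL). It explores how a large language model (LLM) achieves task proficiency without any change to its internal parameters, by using the structure of the prompt to navigate a complex latent space.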

1. From Telling to Showing

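Instructions provide general direction, whereas "imitation" through input-output pairs $(x, y)$ serves as a non-parametric guide. These examples act as statistical anchors: they narrow the model's probability distribution and reduce the ambiguity inherent in natural-language instructions.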
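The narrowing effect can be illustrated with a toy Bayesian analogy. This is only a sketch under stated assumptions: the four candidate tasks in `HYPOTHESES`, the uniform prior, and the helper names `posterior` and `entropy_bits` are all hypothetical, and nothing here claims that an LLM literally enumerates hypotheses. It shows only that each consistent $(x, y)$ pair removes candidate interpretations, so the uncertainty (measured in bits) shrinks.

```python
import math

# Hypothetical candidate tasks that a vague instruction such as
# "transform the word" could plausibly mean (an assumption for illustration).
HYPOTHESES = {
    "uppercase": str.upper,
    "reverse": lambda s: s[::-1],
    "capitalize": str.capitalize,
    "identity": lambda s: s,
}

def posterior(examples):
    """Uniform distribution over the hypotheses consistent with every (x, y) pair."""
    consistent = [
        name for name, fn in HYPOTHESES.items()
        if all(fn(x) == y for x, y in examples)
    ]
    return {name: 1.0 / len(consistent) for name in consistent}

def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return sum(-p * math.log2(p) for p in dist.values())

no_examples = posterior([])                              # instruction only
one_example = posterior([("a", "A")])                    # rules out reverse, identity
two_examples = posterior([("a", "A"), ("cat", "CAT")])   # rules out capitalize

print(entropy_bits(no_examples))   # 2.0
print(entropy_bits(one_example))   # 1.0
print(entropy_bits(two_examples))  # 0.0
```

With no examples, all four interpretations remain possible; two well-chosen pairs leave only one.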

2. The Attention Mechanism

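ICL uses the transformer's attention mechanism to perform "task induction." By identifying regularities in the demonstrated sequence, the model locates a specific functional mapping in a high-dimensional space, which lets it imitate style and structure with high fidelity.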
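To make the role of attention more concrete, here is a minimal sketch of single-query scaled dot-product attention over a two-item context. The 2-dimensional vectors are invented for illustration, and treating each exemplar input as a key and its output as a value is a simplifying assumption; a real transformer spreads this behavior across many layers and heads. The point is only that a query resembling one exemplar's input places most of its attention weight on that exemplar.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention: softmax(q.k / sqrt(d)) applied to values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical embeddings: each exemplar input is a key, its output is a value.
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
query = [4.0, 0.5]  # the test input resembles the first exemplar's input

weights, output = attend(query, keys, values)
print(weights)  # nearly all weight falls on the first exemplar
print(output)   # so the output is close to the first exemplar's value
```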

ICL Pattern Template

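[Context/Instruction]: "Translate the following technical terms into plain, jargon-free language for a general audience."
[Example 1]: "Input: latent space | Output: A hidden mathematical map where the AI stores concepts."
[Example 2]: "Input: transformer | Output: An AI architecture that weighs the importance of different words in a sentence."
[Test Input]: "Input: in-context learning | Output:"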
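The template above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_few_shot_prompt` and omitting the quotation marks around each field for simplicity. It ends the prompt on an open "Output:" slot so that the model completes the pattern established by the examples.

```python
def build_few_shot_prompt(instruction, examples, test_input):
    """Assemble an instruction, (x, y) exemplars, and an open test slot into one prompt."""
    lines = [f"[Context/Instruction]: {instruction}"]
    for i, (x, y) in enumerate(examples, start=1):
        lines.append(f"[Example {i}]: Input: {x} | Output: {y}")
    # The final line leaves the output empty for the model to fill in.
    lines.append(f"[Test Input]: Input: {test_input} | Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate the following technical terms into plain, jargon-free language.",
    [
        ("latent space", "A hidden mathematical map where the AI stores concepts."),
        ("transformer", "An AI architecture that weighs the importance of different words in a sentence."),
    ],
    "in-context learning",
)
print(prompt)
```

Keeping every exemplar in the same "Input | Output" layout matters as much as the content: the consistent structure is the regularity the model is expected to pick up.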


Mechanics Check

Mechanically speaking, what is the primary role of providing $(x, y)$ pairs in a prompt?

To retrain the model's neural weights for a specific task.

To act as anchors that resolve ambiguity and narrow the prediction distribution.

To increase the model's processing speed by reducing sequence length.

To bypass the attention mechanism entirely.

Challenge: From Instruction to Imitation

Imitation Mastery

Vague Instruction: "Rewrite these emails to be professional."

Goal: Provide a three-exemplar few-shot prompt that teaches the model a specific "Concise Executive" style, rather than just a generic professional tone.

Analysis

Why is providing specific examples more effective than simply adding the adjective "Concise" to the instruction?

Solution:
Adjectives like "Concise" are subjective and leave a broad distribution of acceptable outputs; examples provide a concrete structural template that the attention mechanism can pick up and reproduce far more consistently.